<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
	<title>Simple Pattern Split Tokenizer | ElasticSearch 7.7 权威指南中文版</title>
	<meta name="keywords" content="ElasticSearch 权威指南中文版, elasticsearch 7, es7, 实时数据分析，实时数据检索" />
    <meta name="description" content="ElasticSearch 权威指南中文版, elasticsearch 7, es7, 实时数据分析，实时数据检索" />
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
	<link rel="stylesheet" type="text/css" href="../static/styles.css" />
	<script>
	var _link = 'analysis-simplepatternsplit-tokenizer.html';
    </script>
</head>
<body>
<div class="main-container">
    <section id="content">
        <div class="content-wrapper">
            <section id="guide" lang="zh_cn">
                <div class="container">
                    <div class="row">
                        <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                            <div style="color:gray; word-break: break-all; font-size:12px;">原英文版地址: <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-simplepatternsplit-tokenizer.html" rel="nofollow" target="_blank">https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-simplepatternsplit-tokenizer.html</a>, 原文档版权归 www.elastic.co 所有<br/>本地英文版地址: <a href="../en/analysis-simplepatternsplit-tokenizer.html" rel="nofollow" target="_blank">../en/analysis-simplepatternsplit-tokenizer.html</a></div>
                        <!-- start body -->
                  <div class="page_header">
<strong>重要</strong>: 此版本不会发布额外的bug修复或文档更新。最新信息请参考 <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html" rel="nofollow">当前版本文档</a>。
</div>
<div id="content">
<div class="breadcrumbs">
<span class="breadcrumb-link"><a href="index.html">Elasticsearch Guide [7.7]</a></span>
»
<span class="breadcrumb-link"><a href="analysis.html">Text analysis</a></span>
»
<span class="breadcrumb-link"><a href="analysis-tokenizers.html">Tokenizer reference</a></span>
»
<span class="breadcrumb-node">Simple Pattern Split Tokenizer</span>
</div>
<div class="navheader">
<span class="prev">
<a href="analysis-simplepattern-tokenizer.html">« Simple Pattern Tokenizer</a>
</span>
<span class="next">
<a href="analysis-standard-tokenizer.html">Standard Tokenizer »</a>
</span>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="analysis-simplepatternsplit-tokenizer"></a>Simple Pattern Split Tokenizer<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/tokenizers/simplepatternsplit-tokenizer.asciidoc">edit</a>
</h2>
</div></div></div>
<p>The <code class="literal">simple_pattern_split</code> tokenizer uses a regular expression to split the
input into terms at pattern matches. The set of regular expression features it
supports is more limited than the <a class="xref" href="analysis-pattern-tokenizer.html" title="Pattern Tokenizer"><code class="literal">pattern</code></a>
tokenizer, but the tokenization is generally faster.</p>
<p>This tokenizer does not produce terms from the matches themselves. To produce
terms from matches using patterns in the same restricted regular expression
subset, see the <a class="xref" href="analysis-simplepattern-tokenizer.html" title="Simple Pattern Tokenizer"><code class="literal">simple_pattern</code></a>
tokenizer.</p>
<p>This tokenizer uses <a href="http://lucene.apache.org/core/8_5_1/core/org/apache/lucene/util/automaton/RegExp.html" class="ulink" target="_top">Lucene regular expressions</a>.
For an explanation of the supported features and syntax, see <a class="xref" href="regexp-syntax.html" title="Regular expression syntax">Regular Expression Syntax</a>.</p>
<p>The default pattern is the empty string, which produces one term containing the
full input. This tokenizer should always be configured with a non-default
pattern.</p>
<h3>
<a id="_configuration_19"></a>Configuration<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/tokenizers/simplepatternsplit-tokenizer.asciidoc">edit</a>
</h3>
<p>The <code class="literal">simple_pattern_split</code> tokenizer accepts the following parameters:</p>
<div class="informaltable">
<table border="0" cellpadding="4px">
<colgroup>
<col>
<col>
</colgroup>
<tbody valign="top">
<tr>
<td valign="top">
<p>
<code class="literal">pattern</code>
</p>
</td>
<td valign="top">
<p>
A <a href="http://lucene.apache.org/core/8_5_1/core/org/apache/lucene/util/automaton/RegExp.html" class="ulink" target="_top">Lucene regular expression</a>, defaults to the empty string.
</p>
</td>
</tr>
</tbody>
</table>
</div>
<h3>
<a id="_example_configuration_12"></a>Example configuration<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/tokenizers/simplepatternsplit-tokenizer.asciidoc">edit</a>
</h3>
<p>This example configures the <code class="literal">simple_pattern_split</code> tokenizer to split the input
text on underscores.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "simple_pattern_split",
          "pattern": "_"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "an_underscored_phrase"
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/885.console"></div>
<p>The above example produces these terms:</p>
<div class="pre_wrapper lang-text">
<pre class="programlisting prettyprint lang-text">[ an, underscored, phrase ]</pre>
</div>
</div>
<div class="navfooter">
<span class="prev">
<a href="analysis-simplepattern-tokenizer.html">« Simple Pattern Tokenizer</a>
</span>
<span class="next">
<a href="analysis-standard-tokenizer.html">Standard Tokenizer »</a>
</span>
</div>
</div>

                  <!-- end body -->
                        </div>
                        <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                        
                        </div>
                    </div>
                </div>
            </section>
        </div>
    </section>
</div>
<script src="../static/cn.js"></script>
</body>
</html>